Optimizing SMT processors for IP-packet processing

Authors

  • Behnam Robatmili
  • Nasser Yazdani
  • Mehrdad Nourani
Abstract

Simultaneous multithreading (SMT) processors support multiple concurrently active hardware threads that share processor resources such as functional units and memory. This thread-level parallelism (TLP) can be exploited in the form of the packet-level parallelism required in packet-processing devices called network processors. In this paper, we propose hardware queues for scheduling process threads, which yield a 25% improvement in the throughput of IP-lookup threads. To optimize our target SMT network processor, we use IP hashing to improve overall packet throughput and reduce latency. We also propose load-balancing mechanisms at the level of process threads. These optimizations are evaluated using our simulation environment, NPSMT, which simulates a typical SMT network processor, a network controller, and a packet generator. In addition to fast, memory-sensitive IP-lookup threads, our scenarios also include slow, processing-sensitive MD5 threads. Considering the effect of different processor and network-controller parameters, we discuss the performance achieved for the MD5 and IP-lookup programs (Netbench benchmarks) under different workloads. Our results show that the memory sensitivity of IP lookup spreads to all other running threads (MD5) and that the whole system is much more sensitive to memory speed than to other processor parameters such as the number of ALUs and the ILP capacity. © 2005 Elsevier B.V. All rights reserved.
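For readers unfamiliar with the workload, IP lookup is commonly implemented as a longest-prefix match over a routing table. The following is a minimal Python sketch of that idea using a binary trie. It is an illustration only, not the Netbench code or anything from NPSMT; the class and method names (`RoutingTable`, `insert`, `lookup`) are hypothetical, and only IPv4 is handled.

```python
import ipaddress


class TrieNode:
    """One node of a binary trie; children[0] and children[1] follow the next address bit."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None  # set only on nodes where a stored prefix ends


class RoutingTable:
    """Hypothetical IPv4 routing table supporting longest-prefix match."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        addr = int(net.network_address)
        node = self.root
        # Walk the prefix bits from most significant to least significant.
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        addr = int(ipaddress.ip_address(address))
        node = self.root
        best = node.next_hop  # a default route (/0), if present, lives at the root
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match seen so far
        return best


table = RoutingTable()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "A")
table.insert("10.1.0.0/16", "B")
print(table.lookup("10.1.2.3"))  # matches both /8 and /16; the /16 wins
```

Each step of `lookup` follows a pointer to the next node, so a lookup is a chain of dependent memory accesses with very little arithmetic in between. That is one plausible reading of why the abstract describes IP-lookup threads as memory sensitive.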

Similar resources

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and migration to IPv6 addresses. IP address lookup involves computation of the Longest Prefix Match (LPM), for which existing solutions such as BSD Radix Tries scale poorly when traffic in the router increases or when they are employed for IPv6 address lookups. In this paper, we describe a ...

Addressing TCP/IP Processing Challenges Using the IA and IXP Processors

The majority of datacenter applications such as web services, e-commerce, storage, and firewall use Transmission Control Protocol/Internet Protocol (TCP/IP) as the data communication protocol of choice. As such, the performance of these applications is largely dependent upon the efficient processing of TCP/IP packets. In addition, with the arrival of the 10 Gigabit Ethernet, the TCP/IP packet p...

Optimizing TCP Receive Performance

The performance of receive side TCP processing has traditionally been dominated by the cost of the ‘per-byte’ operations, such as data copying and checksumming. We show that architectural trends in modern processors, in particular aggressive prefetching, have resulted in a fundamental shift in the relative overheads of per-byte and per-packet operations in TCP receive processing, making per-pac...

Code and Data Transformations for Improving Shared Cache Performance on SMT Processors

Simultaneous multithreaded processors use shared on-chip caches, which yield better cost-performance ratios. Sharing a cache between simultaneously executing threads causes excessive conflict misses. This paper proposes software solutions for dynamically partitioning the shared cache of an SMT processor, via the use of three methods originating in the optimizing compilers literature: dynamic ti...

Journal:
  • Microprocessors and Microsystems

Volume 29

Publication date: 2005